The mitochondrial genomes of the Geometroidea (Lepidoptera) and their phylogenetic implications

Abstract The Geometroidea is a large superfamily of Lepidoptera in terms of species composition and contains numerous economically important pest species that cause great losses in crop and forest production. However, understanding of its mitogenomes remains limited because relatively few mitogenomes have been reported for this megadiverse group. Here, we sequenced and annotated nine mitogenomes for Geometridae and further analyzed the mitogenomic evolution and phylogeny of the whole superfamily. All nine mitogenomes contained the 37 mitochondrial genes typical of insects, and gene organization was conserved except in Somatina indicataria. In S. indicataria, the positions of two tRNAs were rearranged: trnR was located before trnA instead of after trnA as is typical in Lepidoptera, whereas trnE was detected on the minority strand (N-strand), which is rare. This trnR-trnA-trnN-trnS1-trnE-trnF arrangement newly recognized in S. indicataria represents the first gene rearrangement reported for Geometroidea and is also unique in Lepidoptera. In addition, nucleotide composition analyses showed little heterogeneity among the four geometrid subfamilies involved herein, and overall, nad6 and atp8 had higher nucleotide diversity and Ka/Ks ratios in Geometridae. Finally, the taxonomic assignments of the nine species, historically defined by morphological studies, were confirmed by various phylogenetic analyses based on the hitherto most extensive mitogenomic sampling in Geometroidea.

The mitogenome is characterized by a series of features such as cellular abundance, absence of introns, and a lack of extensive recombination, and thus it provides some of the most important molecular markers, such as the standard cox1 barcode sequence, used in studies on species identification and population genetics of insects, especially for the megadiverse Lepidoptera (Hajibabaei et al., 2006; Hebert et al., 2003, 2004). In recent years, with the decline of sequencing costs, increasing numbers of whole mitogenomes have been sequenced and widely used not only in species identification and delimitation but also in phylogeny and population genetics of the Lepidoptera and other insect groups (e.g., Du et al., 2019; Timmermans et al., 2014; Yang et al., 2015). In addition, mitochondrial gene arrangement is an important source of information for inferring evolutionary relationships of insects. For instance, the gene arrangement trnM-trnI-trnQ (with trnQ located on the minority strand) is regarded as a synapomorphy for Ditrysia, in contrast to some other groups of Lepidoptera such as Adeloidea and Nepticuloidea and to the ancestral insect arrangement, which show trnI-trnQ-trnM instead (Timmermans et al., 2014). However, although Lepidoptera is one of the most species-rich insect orders, fewer gene rearrangement events have been reported for this group than for some other orders, especially the Hemiptera (e.g., Thao et al., 2004) and Hymenoptera (e.g., Tang et al., 2019), even though those orders have relatively lower species diversity.
The Geometroidea is one of the largest superfamilies in Lepidoptera and includes more than 24,000 described extant species (van Nieukerken et al., 2011). As leaf feeders, they feed on many kinds of typically woody plants, thus often causing huge losses in agricultural and forest production (Mitter et al., 2017). Three families, i.e., Geometridae, Uraniidae, and Sematuridae, were previously defined for Geometroidea (Mitter et al., 2017). Later, molecular evidence strongly suggested the inclusion of Epicopeiidae, historically placed in Drepanoidea (Regier et al., 2013), and of a new family, Pseudobistonidae (Rajaei et al., 2015; Wang et al., 2019). To date, the relationship among the five families has been recovered as ((Uraniidae, Geometridae), (Sematuridae, (Epicopeiidae, Pseudobistonidae))) by most previous studies, but this topology needs to be confirmed because the clade consisting of Sematuridae, Epicopeiidae, and Pseudobistonidae has been either weakly supported (Rajaei et al., 2015; Wang et al., 2019) or sparsely sampled (Murillo-Ramos et al., 2019).
In Geometroidea, mitogenomes of approximately 27 species from three families have been sequenced to date (GenBank, November 2021). This number is clearly disproportionate to the huge species diversity of this superfamily. Moreover, among the three families, the reported mitogenomes of both the Epicopeiidae and Uraniidae were each represented by one species. Given the wide application of mitogenomes in the molecular systematics of insects, this situation will hinder progress in investigating Geometroidea systematics using mitogenome data. Based on the existing mitogenomes, comparative analyses among geometroid members and/or deep phylogenetic analyses (e.g., Du et al., 2019; Yang et al., 2013) have been conducted, which greatly furthered our understanding of the phylogeny of this superfamily and related groups. In addition, mitogenome sequences were used to infer the phylogenetic relationships of three closely related Biston species in Geometridae, suggesting the existence of budding speciation in these species (Cheng et al., 2017). In terms of gene arrangement, all published mitochondrial genomes show an identical gene organization that is typical of Lepidoptera, and no gene rearrangement events have been reported for Geometroidea to date.
In this study, the mitogenomes of nine additional geometrid species were sequenced, annotated, and comparatively analyzed, aiming to increase the reported mitogenome diversity of Geometroidea and to improve our understanding of mitogenome evolution in this superfamily. These sequences can also serve as a resource for other studies on the molecular systematics of Geometroidea. Among the nine species, Somatina indicataria showed a gene rearrangement of trnR-trnA-trnN-trnS1-trnE-trnF relative to the trnA-trnR-trnN-trnS1-trnE-trnF typical of Lepidoptera, which represents the first gene rearrangement reported for Geometroidea and is also unique in Lepidoptera.

| Samples, DNA extraction, and mitogenome sequencing
Adult moths were collected by light trap at Mountain Jigongshan and Lushan County of Henan Province, China, from July to August 2020. Each specimen was identified through morphology and by a BLAST search of the standard mitochondrial cox1 barcode against the GenBank database. After identification, nine species from Geometridae of Geometroidea were selected, of which six were from Ennominae, two from Geometrinae, and one from Sterrhinae, mainly because of the high species diversity of Ennominae and the lack of existing mitogenomes for Geometrinae and Sterrhinae. Detailed specimen information is shown in Table S1, and voucher specimens are deposited in the Biology Laboratory of Zhoukou Normal University, China. For the phylogenetic analyses of the Geometroidea, two other mitogenome sequences were retrieved from the transcriptomes of Mania lunus (SRR1695439) and Calledapteryx dryopterata (SRR1021601) available on GenBank, representing the Sematuridae and Uraniidae, respectively, and were combined with the other available mitogenomes.
Total genomic DNA was extracted from thoracic tissue isolated from a single specimen using the DNeasy tissue kit (Qiagen, Germany), following the manufacturer's instructions. Nine libraries (one per species) were constructed with the TruSeq DNA PCR-Free Sample Preparation Kit (Illumina, United States), and sequencing was conducted on an Illumina HiSeq 2500 platform with a 150-bp paired-end strategy.
Next, we assembled each mitogenome from clean paired reads using Geneious R11 (Kearse et al., 2012). In this analysis, the "map to reference" strategy was selected to map all cleaned reads to an "anchor" consisting of the standard mitochondrial cox1 barcoding sequence, which had been amplified earlier using the insect primer pair Lco1490 (F) and Hco2198 (R) (Folmer et al., 1994). After up to 100 iterations with custom sensitivity, a target contig sequence with high coverage was generated. Lastly, MEGA X (Kumar et al., 2018) was used to check the beginning and end of the contig sequence and, after deleting the overlapping sequence, to circularize a complete mitochondrial genome.
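The circularization step described above can be illustrated with a minimal sketch. It assumes that the contig ends overlap exactly (no sequencing errors in the overlap) and that the duplicated region is at least `min_overlap` bases long; the function name `circularize`, the parameter `min_overlap`, and the toy sequence are hypothetical and are not part of the workflow carried out in Geneious R11 and MEGA X.

```python
def circularize(contig: str, min_overlap: int = 20) -> str:
    """Trim the longest suffix of `contig` that exactly repeats its prefix.

    Assumption (not from the original workflow): the terminal overlap is an
    exact match, so a simple string comparison is sufficient.
    """
    max_len = len(contig) // 2
    # Try the longest candidate overlap first, then progressively shorter ones.
    for k in range(max_len, min_overlap - 1, -1):
        if contig[:k] == contig[-k:]:
            return contig[:-k]  # drop the duplicated end
    return contig  # no terminal overlap found; contig is returned unchanged


# Toy example: a 24-bp "genome" whose first 8 bases are repeated at the end.
unique = "ACGTTGCAAGGCTTAACCGGTATC"
linear_contig = unique + unique[:8]
circular = circularize(linear_contig, min_overlap=5)
print(len(linear_contig), len(circular))
```

In practice the overlap would still be inspected by eye, as was done here in MEGA X, because low-complexity regions such as the control region can produce spurious terminal matches.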
Each mitogenome sequence was annotated using the MITOS2 webserver (Donath et al., 2019) with the invertebrate genetic code. Gene boundaries were confirmed by aligning the gene sequences of the new mitogenome with those of previously reported geometrid mitogenomes available on GenBank using MEGA X (Kumar et al., 2018). The circular maps of the nine mitogenomes generated in this study were comparatively presented using the CGView Comparison Tool (Grant et al., 2012). In addition, two species, Mania lunus and Calledapteryx dryopterata, belonging to Sematuridae and Uraniidae, respectively, were added to the phylogenetic analyses of this study. Mitogenomes of these two species were assembled using the same methods as for the nine species, from their transcriptomes deposited on GenBank (accession numbers SRR1695439 and SRR1021601).

| Sequence alignment and analyses
A total of 38 mitogenomes of Geometroidea were compiled and analyzed, including nine newly sequenced in the present study, two retrieved from publicly available transcriptomes, and 27 downloaded from GenBank. In addition, mitogenomes of 13 species from Noctuoidea, Bombycoidea, Lasiocampoidea, Drepanoidea, and Mimallonoidea, which represent the close relatives of the Geometroidea, were selected as outgroup sequences in phylogenetic analyses (Table 1).
Among the 37 mitochondrial genes, the 13 PCGs were individually aligned using the MUSCLE method in the TranslatorX online platform (Abascal et al., 2010) after the sequences were translated with the invertebrate genetic code. The two rRNAs and 22 tRNAs were independently aligned with the Q-INS-i algorithm as implemented in the MAFFT online platform (Katoh et al., 2019). Further, the aligned tRNA and rRNA sequences were filtered using ClipKIT (Steenwyk et al., 2020) with the kpic-gappy mode to delete ambiguously aligned sites.
Nucleotide composition was calculated using MEGA X, and AT and GC skews were evaluated following Perna and Kocher (1995). DAMBE 5.3.74 (Xia, 2013; Xia et al., 2003) was used to conduct tests of substitutional saturation of different data partitions based on the Iss (i.e., index of substitutional saturation) statistic. For this method, if Iss is significantly smaller than Iss.c (critical Iss), the sequences concerned may have experienced little substitutional saturation (Xia & Lemey, 2009). Nucleotide diversity and the ratio of nonsynonymous substitutions (Ka) to synonymous substitutions (Ks) for PCGs were calculated using DnaSP 5.0 (Librado & Rozas, 2009). The effective number of codons (ENC) was calculated using CodonW 1.4.2 (Peden, 2000).
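As an illustration of the composition statistics used here, the following minimal sketch computes A + T content and the two skews with the standard definitions of Perna and Kocher (1995), AT skew = (A - T)/(A + T) and GC skew = (G - C)/(G + C). The formulas are not spelled out in the text above, and the function name `composition` and the toy sequence are hypothetical; the values in this study were obtained with MEGA X, not with this code.

```python
from collections import Counter


def composition(seq: str) -> dict:
    """Return A+T content (%) and AT/GC skews for a nucleotide sequence.

    Ambiguous bases (e.g. N) are ignored; this is an assumption of the sketch.
    """
    counts = Counter(seq.upper())
    a, t, g, c = (counts[base] for base in "ATGC")
    total = a + t + g + c
    if total == 0 or (a + t) == 0 or (g + c) == 0:
        raise ValueError("sequence lacks the bases needed to compute skews")
    return {
        "AT_content": 100.0 * (a + t) / total,
        "AT_skew": (a - t) / (a + t),  # > 0 means more A than T
        "GC_skew": (g - c) / (g + c),  # < 0 means more C than G
    }


# Toy sequence: 4 A, 3 T, 2 G, 1 C.
stats = composition("AAAATTTGGC")
print(stats)
```

A negative GC skew on the majority strand, as reported for Geometridae in this study, corresponds to `GC_skew < 0` in this sketch.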

| Phylogenetic analyses
To test the phylogenetic implications of the eleven newly generated mitogenomes, various phylogenetic analyses were performed based on five datasets (PCG123R, PCG123, PCG12R, PCG12, and PCGAA), which differ in the inclusion or exclusion of the third codon positions and RNA sequences. Maximum likelihood (ML) analyses were conducted using IQ-TREE 2.0.4 (Nguyen et al., 2015) under the partitioning schemes and corresponding substitution models (Tables S2 and S3) determined by ModelFinder (Kalyaanamoorthy et al., 2017). Branch supports (BS) were calculated using 1000 ultrafast bootstrap replicates (Hoang et al., 2018). Bayesian inference (BI) analyses were performed with MrBayes 3.2.6 (Ronquist et al., 2012) with the partitioned models (Tables S4 and S5) determined by PartitionFinder 2.1.1 (Lanfear et al., 2017). Twelve processors were used to perform two independent runs simultaneously, each with six chains (five heated and one cold), for at least 500,000 generations sampled every 100 generations.
Convergence was considered to be reached when the effective sample size (ESS) value, as assessed in Tracer 1.7 (Rambaut et al., 2018), was above 200 and the potential scale reduction factor (PSRF) approached 1.0 (Ronquist et al., 2012). The first 25% of samples were discarded as burn-in, and the remaining trees were used to calculate posterior probabilities (PP) in a 50% majority-rule consensus tree.
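The convergence criteria above can be made concrete with a short sketch. It assumes the basic Gelman and Rubin form of the PSRF computed on a single scalar parameter (for example, the log-likelihood trace of each run) and a run of 500,000 generations sampled every 100 generations, giving about 5,000 samples per run. The function names `psrf` and `discard_burnin` and the toy traces are hypothetical; MrBayes and Tracer compute their diagnostics internally, and their exact implementations may differ from this illustration.

```python
from statistics import mean, variance


def discard_burnin(samples, fraction=0.25):
    """Drop the first `fraction` of samples (25% burn-in by default)."""
    return samples[int(len(samples) * fraction):]


def psrf(chains):
    """Basic Gelman-Rubin potential scale reduction factor.

    `chains` is a list of equal-length lists of sampled values,
    one list per independent run.
    """
    n = len(chains[0])
    chain_means = [mean(chain) for chain in chains]
    within = mean([variance(chain) for chain in chains])  # W
    between = n * variance(chain_means)                   # B
    pooled = (n - 1) / n * within + between / n
    return (pooled / within) ** 0.5


# 500,000 generations sampled every 100 -> 5,000 samples; 25% are discarded.
kept = discard_burnin(list(range(5000)))
print(len(kept))

# Two toy runs sampling the same values in different order: PSRF close to 1.
run_a = [1.0, 2.0, 3.0, 4.0] * 50
run_b = [4.0, 3.0, 2.0, 1.0] * 50
# A run stuck in a different region: PSRF well above 1.
run_c = [x + 10.0 for x in run_a]
print(psrf([run_a, run_b]), psrf([run_a, run_c]))
```

Values near 1.0 indicate that the independent runs are sampling the same distribution, whereas a clearly larger value, as for `run_a` versus `run_c`, signals that the runs have not converged.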

| General mitogenome feature and gene rearrangement
Eight complete and one nearly complete mitogenomes were generated and annotated for the nine geometrid species, which increased the reported mitogenome diversity, especially for the Geometrinae and Sterrhinae. In the nearly complete genome (P. rufofasciata), we failed to assemble part of the control region, which is characterized by highly biased base composition. The eight completely sequenced mitogenomes ranged from 15,250 bp (M. senilis) ... of RNAs were not assembled, but the 13 PCGs were completely annotated and used only in subsequent phylogenetic analyses.
TABLE 1 The species used in phylogenetic analyses.
All mitogenomes contained the 37 mitochondrial genes typical of insects (Figure 1), and, except in S. indicataria, these 37 genes showed a gene organization identical to that of other reported geometrid mitogenomes, which is also typical of Lepidoptera (Cameron, 2014; Wu et al., 2016). In the mitogenome of S. indicataria, the positions of two tRNAs were rearranged. The trnR was located before trnA instead of after trnA as is typical in Lepidoptera, whereas the trnE was translocated from the routinely recognized majority strand (J-strand) to the minority strand (N-strand). In addition, two long intergenic sequences (121 bp and 61 bp) were present before and after trnR, respectively, which were also important features distinct from other reported geometrid mitogenomes. To compare mitogenome evolution, mitochondrial gene rearrangement events previously reported for Lepidoptera were summarized and illustrated in Figure 2.
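The two changes described above, a positional swap of trnR and trnA and a strand switch of trnE, can be expressed as a simple comparison of two gene orders. The sketch below is illustrative only: the function name `compare_arrangements` is hypothetical, and the strand labels for genes other than trnE follow the conventional insect annotation (trnF on the N-strand, the rest on the J-strand), which is an assumption not stated in the text.

```python
# tRNA cluster between nad3 and nad5 as (gene, strand) pairs.
# Strands other than that of trnE are assumed from the conventional insect
# annotation; the text only states the strand change of trnE.
TYPICAL_LEPIDOPTERA = [
    ("trnA", "J"), ("trnR", "J"), ("trnN", "J"),
    ("trnS1", "J"), ("trnE", "J"), ("trnF", "N"),
]
S_INDICATARIA = [
    ("trnR", "J"), ("trnA", "J"), ("trnN", "J"),
    ("trnS1", "J"), ("trnE", "N"), ("trnF", "N"),
]


def compare_arrangements(reference, query):
    """Return (genes at a different position, genes on a different strand)."""
    ref_position = {gene: i for i, (gene, _) in enumerate(reference)}
    ref_strand = dict(reference)
    moved, strand_switched = [], []
    for i, (gene, strand) in enumerate(query):
        if ref_position[gene] != i:
            moved.append(gene)
        if ref_strand[gene] != strand:
            strand_switched.append(gene)
    return moved, strand_switched


moved, strand_switched = compare_arrangements(TYPICAL_LEPIDOPTERA, S_INDICATARIA)
print("position changed:", moved)
print("strand changed:", strand_switched)
```

A position-by-position comparison is sufficient for this small cluster; larger rearrangements would call for a dedicated breakpoint or inversion analysis.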
Comparative analysis showed that two rearrangement clusters can be recognized in this order. One includes the three tRNAs trnM, trnI, and trnQ. The gene arrangement trnM-trnI-trnQ is found in most lepidopteran members, in contrast to the trnI-trnQ-trnM in some nonditrysian lineages of Lepidoptera such as the Hepialoidea (Cao et al., 2012). The other is the gene cluster including six tRNAs between the nad3 and nad5 genes. In this gene cluster, eleven kinds of gene rearrangements have been reported across seven superfamilies. In S. indicataria, trnR is located before trnA, similar only to Parasa consocia of Limacodidae among Lepidoptera, whereas trnE was detected on the N-strand, which is rare. To confirm this result, we reassembled the mitogenome from the high-throughput sequencing data using Geneious R11 or other software. Moreover, before sequencing, the library was constructed using a single specimen of S. indicataria. Overall, the trnR-trnA-trnN-trnS1-trnE-trnF arrangement recognized in this study represents the first gene rearrangement event reported for Geometroidea and is also unique in Lepidoptera, which broadens our understanding of gene rearrangement in Geometroidea and Lepidoptera.
FIGURE 1 Circular diagram of the nine mitogenomes sequenced in this study. Different colors at the outer circle show the nucleotide identity of BLAST hits relative to the reference mitogenome of Somatina indicataria.
Among the four subfamilies of Geometridae (Figure 3a), the A + T contents ranged from 80.21% (Larentiinae) to 82.04% (Sterrhinae), showing little heterogeneity in nucleotide composition, in contrast to some insect groups at generally the same taxonomic level (Liu et al., 2018; Nie et al., 2020; Song et al., 2016; Tang et al., 2019; Yang et al., 2018). Among the three codon positions within the 13 PCGs, the lowest A + T content was found for the second codon ...
FIGURE 2 Gene arrangements of reported lepidopteran mitogenomes relative to the ancestral insect mitogenome. Red rectangles indicate the two gene clusters with gene rearrangement. Underlined genes are located on the N-strand. The taxon with the gene rearrangement reported in this study and its classification are marked in bold.

TABLE 2 Nucleotide composition of nine newly determined mitogenomes for Geometridae.
... and Cimicomorpha of Hemiptera (Yang et al., 2018). Overall, rRNAs showed a higher A + T content than PCGs and tRNAs. The AT skew and GC skew are commonly used for evaluating the nucleotide composition of insect mitogenomes (Perna & Kocher, 1995; Wei et al., 2010). In Geometridae, negligible AT skews and negative GC skews were recognized (Figure 3b, Table S7), and four geometrid subfamilies con- ... indicates neutral codon usage (Wright, 1990). In the reported mitogenomes of Geometridae, the ENC values (Figure 3c) ranged from 30.4 to 35.53, differing little among the four subfamilies involved in this study but overall indicating some degree of codon usage bias. Moreover, the positive correlation between the ENC and GC3s (Figure 3d) indicates that the genomic G + C content is a significant factor in determining codon bias among geometrid species (Hershberg & Petrov, 2008; Plotkin & Kudla, 2011).

| Mitochondrial gene variation of Geometridae
Nucleotide diversity is commonly used for identifying regions with high nucleotide divergence and could provide guidelines for selecting species-or group-specific markers used in molecular evolutionary studies, especially for taxa with high morphological similarity

| Phylogenetic analyses of Geometroidea
Tests of substitution saturation (Table 3) ... rate and might contain phylogenetic noise (Owen et al., 2015). Thus, in subsequent phylogenetic analyses, five datasets associated with the inclusion and exclusion of the third codon positions and RNA sequences were considered to test the stability of topologies.
According to recent molecular investigations (Rajaei et al., 2015; Regier et al., 2013; Wang et al., 2019), five families are included in Geometroidea. The present study sampled four families, including the Uraniidae, which is represented in a mitogenome-based phylogenetic investigation for the first time.
Based on the hitherto most extensive mitogenomic sampling, our various resulting trees (Figures 5 and 6 ... further investigation is necessary based on extensive sampling of these families (Mitter et al., 2017).
... , and PCGAA datasets, respectively. The "*" on a node represents bootstrap values ≥90 for all datasets. The "-" represents an unrecovered node in the ML tree of the corresponding dataset.
In Geometridae, eight subfamilies are currently recognized. The nine mitogenomes sequenced in the present study represented nine species of three subfamilies of the Geometridae, of which one belongs to Sterrhinae, two to Geometrinae, and the remaining six to Ennominae. Their taxonomic assignments were confirmed using mitogenome evidence for the first time, which provides support for previous morphological studies (Jiang et al., 2011, 2012, 2014; Kuzmin & Beljaev, 2021; Sihvonen & Kaila, 2004; Walia, 2015).

CONFLICT OF INTEREST STATEMENT
The authors declare no conflict of interest.

DATA AVAILABILITY STATEMENT
All mitogenome sequences generated in this study were deposited in the GenBank under accession numbers MZ902335-MZ902343.

FIGURE 6 Bayesian tree inferred with MrBayes based on the PCG123R dataset. Species with newly sequenced mitogenomes are emphasized in bold. Numbers separated by slashes (/) on nodes represent the posterior probabilities based on the PCG123R, PCG123, PCG12R, PCG12, and PCGAA datasets, respectively. The "*" on a node represents posterior probabilities ≥0.95 for all datasets. The "-" represents an unrecovered node in the Bayesian tree of the corresponding dataset.